【cherry-pick】upgrade triton moe config. by xuanyuanminzheng · Pull Request #7979 · PaddlePaddle/FastDeploy

xuanyuanminzheng · 2026-06-02T09:54:24Z

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

PaddlePaddle-bot · 2026-06-02T10:34:49Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-03 15:36:45

This CI report was generated from the following code (refreshed every 30 minutes):
PR commit: a45e569 | Merge base: ac24fcc (branch: release/2.6)

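1. Required jobs: 8/10 passed

| Total runs (reruns) | Total jobs | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Pending | Skipped |
| --- | --- | --- | --- | --- | --- | --- |
| 35(0) | 35 | 30 | 5 | 0 | 0 | 0 |

| Job | Error type | Confidence | Log |
| --- | --- | --- | --- |
| `Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage` | PR issue: test assertion not adapted to the new GPU config table | High | Job |
| `Approval` | Approval required | - | Job |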

2. Failure details

🔴 Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: PR issue (confidence: high)

Error type: PR issue | Confidence: high
Analyzer: generic analysis (fallback)
Failing cases:

| Case | Error summary |
| --- | --- |
| `tests/layers/test_fused_moe_triton_backend.py::TestTritonMoEMethod::test_get_default_config_num_stages` | Assertion `cfg32["num_stages"] == 4` failed; the actual value is 5 |

Key log:

def test_get_default_config_num_stages(self):
    """M<=32 → num_stages=4; M>32 → num_stages=3."""
    method = backend.TritonMoEMethod()
    cfg32 = method._get_default_config(M=32, E=8)
>       assert cfg32["num_stages"] == 4
E       assert 5 == 4
tests/layers/test_fused_moe_triton_backend.py:1055: AssertionError

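Root cause summary: the value asserted by the test does not match the new GPU config table.

The PR changes `_get_default_config` from a generic heuristic (which always returns num_stages=4 for M<=32) to a table lookup keyed by GPU architecture (SM90/SM100). CI runs on H100 (SM90), where `_SM90_CONFIGS[32]["num_stages"] = 5`, while the test's docstring and assertion still hard-code num_stages=4 from the old logic, so the assertion fails.

Suggested fix:

Update line 1055 of tests/layers/test_fused_moe_triton_backend.py: either change the assertion from == 4 to == 5 to match SM90, or derive the expected value dynamically from get_sm_version(), for example: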
```
expected = 5 if get_sm_version() >= 90 else 4
assert cfg32["num_stages"] == expected
```
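
Another way to make the assertion independent of the CI machine is to pin the SM version inside the test. The sketch below is self-contained and uses stand-ins only: `FakeBackend`, its `get_sm_version` stub, and `check_num_stages` are hypothetical names, not FastDeploy code. The only values taken from the report above are `num_stages = 5` for M=32 on SM90 and the old heuristic value of 4.

```python
from unittest import mock


class FakeBackend:
    """Hypothetical stand-in for the backend; not FastDeploy code."""

    # Only this single entry is taken from the CI report above.
    _SM90_CONFIGS = {32: {"num_stages": 5}}

    @staticmethod
    def get_sm_version():
        # Stub: a real implementation would query the GPU.
        raise RuntimeError("no GPU available in this sketch")

    @classmethod
    def get_default_config(cls, M):
        if cls.get_sm_version() >= 90:
            return cls._SM90_CONFIGS[M]
        # Old heuristic from the test docstring: M<=32 -> 4, otherwise 3.
        return {"num_stages": 4 if M <= 32 else 3}


def check_num_stages(sm_version, expected):
    # Pin the SM version so the result does not depend on the host GPU.
    with mock.patch.object(FakeBackend, "get_sm_version", return_value=sm_version):
        return FakeBackend.get_default_config(M=32)["num_stages"] == expected
```

With the version pinned, each architecture path can be asserted separately on any machine, for example `check_num_stages(90, 5)` and `check_num_stages(80, 4)`.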

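Related change: fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py, where the `_get_default_config` method now looks up configs by SM version (adds `_SM100_CONFIGS` and `_SM90_CONFIGS`).

🔴 Approval: approval required (confidence: -)

This job requires manual approval; CI continues only after the approval is granted.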

codecov-commenter · 2026-06-02T11:19:29Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (release/2.6@ac24fcc). Learn more about missing BASE report.

Additional details and impacted files

@@              Coverage Diff               @@
##             release/2.6    #7979   +/-   ##
==============================================
  Coverage               ?   72.14%           
==============================================
  Files                  ?      386           
  Lines                  ?    55654           
  Branches               ?     8740           
==============================================
  Hits                   ?    40152           
  Misses                 ?    12668           
  Partials               ?     2834

| Flag | Coverage Δ |
| --- | --- |
| GPU | `72.14% <100.00%> (?)` |

PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-06-03 20:01:04

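📋 Review summary

PR overview: adds a pre-tuned SM100 (B200) tile config lookup table to `_get_default_config` in the Triton MoE backend, replacing the original heuristic rules with nearest-neighbor matching; the tests add coverage for the SM100 path and remove their dependence on the GPU model.
Scope of change: model_executor/layers/moe/, tests/layers/
Impact tag: [OP]

Issues

No blocking issues found. PR convention issues are reported in the section below and are not repeated here.

Status of previous findings

| Finding | Issue | Status |
| --- | --- | --- |
| F1 | Large dict objects are recreated on the hot path | ⚠️ Still present |
| F2 | Whether falling back from SM80 (A100) to the SM90 (H100) config is reasonable | ⚠️ Still present |

F1: `_SM100_CONFIGS` (18 entries) is still defined inside the body of `_get_default_config`, so the dict is rebuilt on every call. This PR officially enables the SM100 path, so consider promoting the table to a class attribute or module-level constant to avoid repeated allocation on the inference hot path.

F2: SM80 (A100) and SM90 (H100) still share the same vLLM heuristic parameters; A100 has not been tuned separately. The author could state in the PR whether this is intentional (for example, because the impact on A100 is acceptable).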
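
F1 can be illustrated with a minimal sketch. It assumes nothing about the real table: the `_DEMO_CONFIGS` entries and both function names are hypothetical placeholders. The point is only that a dict literal inside a function body is rebuilt on every call, while a module-level constant is built once at import time.

```python
# Hypothetical placeholder values; NOT the tuned SM100 table from the PR.
_DEMO_CONFIGS = {
    1: {"BLOCK_SIZE_M": 16, "num_stages": 4},
    64: {"BLOCK_SIZE_M": 64, "num_stages": 3},
}


def config_rebuilt_each_call(M):
    # The dict literal below is re-created on every call.
    table = {
        1: {"BLOCK_SIZE_M": 16, "num_stages": 4},
        64: {"BLOCK_SIZE_M": 64, "num_stages": 3},
    }
    return table[M]


def config_from_constant(M):
    # Reuses the module-level table; the table itself is never re-allocated.
    return _DEMO_CONFIGS[M]
```

Both functions return equal configs, but only `config_from_constant` returns the same object each time. One consequence of sharing the table is that callers should treat the returned dict as read-only, or copy it before modifying it.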

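📝 PR convention check

The PR title lacks an official tag prefix, and although the target branch is release/2.6 the title does not use the [Cherry-Pick] format; the sections of the PR description contain no substantive content.

Suggested title (can be copied as-is):

[Cherry-Pick][Optimization] GPU-aware triton MoE tile config for SM100 (B200)(#original PR number)

Suggested PR description (can be copied as-is):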

## Motivation

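Replace the purely heuristic rules in the Triton MoE `_get_default_config` with a GPU-architecture-aware, pre-tuned lookup table: SM100 (B200) uses the 18 configs that SGLang measured for E=64, N=1856 and picks the best tile size by nearest-neighbor matching, improving Triton compute performance of MoE layers on B200.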

## Modifications

- `fused_moe_triton_backend.py`: `_get_default_config` adds a `get_sm_version() >= 100` branch with a built-in SM100 lookup table (M=1~4096, 18 sample points), selected by nearest neighbor via `min(..., key=lambda x: abs(x - M))`
- `_get_default_config`: all other GPUs keep the original vLLM heuristic path unchanged
- `test_fused_moe_triton_backend.py`: all `_get_default_config` tests add `_mock_sm90` / `_mock_sm100` monkeypatches to remove the dependence on the real GPU model; adds 5 tests for the SM100 path; `test_apply_kernel_even_ks_*` and `test_apply_large_batch_config` fix hard-coded GPU-model assumptions

## Usage or Command

N/A

## Accuracy Tests

N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
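
The nearest-neighbor selection quoted in the Modifications list, `min(..., key=lambda x: abs(x - M))`, can be tried in isolation. In the sketch below the sample points and the `nearest_key` helper are hypothetical (the PR's 18 points are not reproduced here). It shows two properties of this expression: values of M outside the sampled range clamp to the nearest end, and a tie resolves to whichever key `min` encounters first.

```python
def nearest_key(sample_points, M):
    """Return the sampled batch size closest to M (first one wins on ties)."""
    return min(sample_points, key=lambda x: abs(x - M))


# Hypothetical sample points in ascending order; not the PR's actual keys.
SAMPLE_POINTS = [1, 2, 4, 8, 16, 32, 64, 4096]
```

Because the points are listed in ascending order, a tie such as M=48 (equidistant from 32 and 64) resolves to the smaller key. If the larger tile were preferred on ties, the iteration order or the key function would need to change.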

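Overall assessment

The change is clearly structured, the SM100 lookup table values have a traceable source, and the tests provide thorough coverage and correctly remove the hard-coded GPU-model assumptions. The two previously reported issues (dict creation on the hot path, suitability of the SM80 config) are still present; the author should follow up with an explanation or a fix before merging.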

xuanyuanminzheng changed the title ~~Feature ep triton moe~~ 【cherry-pick】upgrade triton moe config. Jun 2, 2026

upgrade triton moe config in sm100.

a816e85

xuanyuanminzheng force-pushed the feature-ep-triton-moe branch from a45e569 to a816e85 Compare June 3, 2026 11:41

PaddlePaddle-bot reviewed Jun 3, 2026

【cherry-pick】upgrade triton moe config. #7979
xuanyuanminzheng wants to merge 1 commit into PaddlePaddle:release/2.6 from xuanyuanminzheng:feature-ep-triton-moe